COPING WITH THE INFORMATION EXPLOSION PROVIDED
BY MODERN CHEMICAL INSTRUMENTATION


Sam  P.  Perone 

Chemistry  &  Materials  Science  Department 
Lawrence  Livermore  National  Laboratory 
Livermore, CA 94550


Modern  chemical  instrumentation  is  capable  of  generating  enormous 
amounts  of  data  in  very  short  periods  of  time.  It  is  clear  that  a  major 
goal of scientists for the near future is to develop techniques to
utilize this capability more effectively, in order to avoid the typical
dilemma  of  being  buried  in  data  with  little  or  no  perspective  of  the 
information  content.  Thus,  there  are  three  key  developments  that  must  be 
pursued:  definition  of  "information  content";  identification  of  methods 
to correlate instrumental parameters with information content; and
development  of  tools  for  the  instrumental  enhancement  of  information 
content  and  the  efficient  extraction  of  information  from  data.  These 
developments should allow the evolution of "smart instruments", perhaps
guided  by  artificial  intelligence  principles.  This  paper  will  describe 
some  of  the  principles  and  tools  that  have  already  been  developed,  and 
will  identify  the  areas  where  work  needs  to  be  done. 

Modern  instrumentation  for  chemical  analysis,  because  of  the 
incorporation  of  digital  computer  systems,  allows  the  generation  and 
collection of immense amounts of data. This is facilitated by computer
control of experimental variables and high-speed collection of multiple
channels  of  data.  This  in  turn  allows  complex  measurement  principles  to 
be  implemented,  with  correspondingly  complicated  multivariate  analysis. 

Unfortunately, the data explosion that has accompanied the evolution
of  modern  chemical  instrumentation  has  not  provided  a  corresponding 
information  explosion.  This  is  because  relatively  little  attention  has 
been  paid  to  the  development  of  techniques  for  optimization  of 
information  content,  or  for  enhancement  and  extraction  of  information. 

It  is  not  uncommon  to  observe  a  scientist  buried  in  a  data  printout  from 
an  experiment,  manually  scanning  columns  of  data,  calculator  in  hand, 
attempting  to  extract  useful  information. 

It  is  time  to  turn  our  attention  to  developing  more  effective  methods 
for  obtaining  information  from  complex  experimental  systems.  The  first 
step involves the definition of generic concepts of information content
which  are  independent  of  the  specific  instrumental  system.  This  is  a 
task  which  has  been  surprisingly  neglected  in  the  past.  The  very 
simplest  concepts  which  must  be  defined  include: 
o  informational  goals 

o  information  content 

o  information  enhancement 

The  next  step  is  to  apply  the  basic  principles  of  information  theory, 
signal  processing  theory,  multivariate  data  interpretation,  and  adaptive 
instrumental control in order to enhance and effectively extract
information.

INFORMATION  GOALS 

The primary requirement in the process of information enhancement is
to define the informational goal(s) associated with a set of experimental
measurements.  Equally  important  is  the  definition  of  an  appropriate 
measure of the degree to which the informational goal is achieved. Some
generic quantitative informational goals and their respective figures of
merit might be:

GOAL              FIGURES OF MERIT

concentration     accuracy/precision
resolution        peak separation/peak width
sensitivity       detection limit/response slope
matrix effects    linearity/interference effects

In  addition,  it  is  possible  to  define  qualitative  informational 
goals.  These  might  include: 

o  identification  of  chemical  components 

o  classification  of  materials/properties 

o  establishment  of  chemical  mechanism. 

Corresponding  figures  of  merit  for  the  qualitative  informational 
goals  can  be  defined  in  terms  of  statistical  accuracy  by  evaluation  with 
systems  of  known  properties. 

INFORMATION CONTENT

This  concept  is  one  of  the  most  difficult  to  quantitate.  There  are 
some  relatively  explicit  definitions  of  information  content  for 
electronic  communications.  (For  example,  the  Nyquist  theorem  defines  the 
minimum  sampling  rate  required  in  order  to  preserve  the  maximum  frequency 
information  in  a  periodic  signal.  And,  the  relationships  between  digital 
encoding formats and information content of a data base can be
quantitated.) However, for the general problem of evaluating the results
of instrumental measurements of chemical systems, the definitions for
information content of data are very unclear.
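
The sampling criterion mentioned above can be made concrete with a short sketch. It assumes a single sinusoidal component and uses the standard folding relation for the frequency that is observed after sampling; the function names are illustrative and do not come from the original work.

```python
def min_sampling_rate(f_max_hz):
    """Nyquist criterion: the sampling rate must exceed twice the highest frequency."""
    return 2.0 * f_max_hz


def apparent_frequency(f_signal_hz, f_sample_hz):
    """Frequency observed when a sinusoid at f_signal_hz is sampled at f_sample_hz.

    The true frequency is folded to the nearest multiple of the sampling rate.
    """
    k = round(f_signal_hz / f_sample_hz)
    return abs(f_signal_hz - k * f_sample_hz)


# A 100 Hz component needs a sampling rate above 200 Hz.
required = min_sampling_rate(100.0)
# Sampled adequately (250 Hz) the component keeps its frequency;
# sampled too slowly (120 Hz) it appears at a lower, aliased frequency.
adequate = apparent_frequency(100.0, 250.0)
aliased = apparent_frequency(100.0, 120.0)
```

In the undersampled case the frequency information is not merely degraded but replaced by a false value, which is one reason an explicit definition of information content is needed before data are collected.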

One  goal  of  our  research  program  is  to  develop  explicit  and 
quantitative  definitions  of  information  content  which  may  be  useful  for 
chemical  instrumentation  systems.  These  will  be  based  on  the  principles 
of  information  theory,  sampling  theory,  and  signal  processing  theory.  At 
this  time,  however,  we  can  describe  an  empirical  approach  to  evaluation 
of  information  content  which  we  have  found  very  useful. 

This  approach  involves  the  following  steps: 

o  define  the  "desired  information"  (informational  goal(s)) 

o  define  a  figure  of  merit  for  goal  achievement  (e.g.,  accuracy, 
precision,  reliability,  etc.) 

o  empirically  determine  "information  content"  from  the  relationship 
[INFO. GOAL] = f [INFO. CONTENT]  (1)

From  the  above  statement  the  information  content  of  a  chemical 
measurement  system  can  be  evaluated  by  studying  the  effects  of 
experimental  factors  on  the  degree  of  achievement  of  the  informational 
goal(s).  This  is  elaborated  below. 

INFORMATION ENHANCEMENT

An  empirical  procedure  can  be  defined  for  the  enhancement  of 
information content. First, it must be recognized that the achievement
of desired informational goal(s) depends not only on the inherent
information content of data, but also on the data management and analysis
procedures.  This  is  expressed  in  Equation  (2): 

[INFO.  GOAL]  =  f [CONTENT,  MGMT,  ANALYSIS]  (2) 

Thus,  to  examine  the  relationship  between  information  content  and 
experimental  factors,  it  is  necessary  to  maintain  consistent  data 
management  and  analysis  procedures.  Then,  one  can  assume  a  direct 
relationship  between  the  achievement  of  informational  goals  and 
information  content  as  implied  in  Equation  (1). 

A  study  designed  to  determine  the  effects  of  experimental  factors  on 
information  content  might  be  based  on  the  relationship  defined  by 
Equation  (3): 


[INFO. CONTENT] = g [MEASUREMENT PRINCIPLES,
                     EXPTL DESIGN,
                     EXPTL PARAMETERS]  (3)

Procedurally,  one  could  vary  any  of  the  experimental  factors  in 
Equation  (3)  and  evaluate  the  effects  on  information  content  under 
conditions  where  Equation  (1)  applies. 

In  order  to  clarify  the  general  concepts  defined  in  the  above 
sections,  the  following  sections  will  describe  an  experimental  study 
which  followed  those  principles  in  order  to  achieve  specified 
informational  goals. 


ELECTROCHEMICAL  STRUCTURAL  AND  ACTIVITY  CLASSIFICATIONS 

The classification of chemical structure using electrochemical
techniques is a challenging problem. Voltammetric responses lack fine
structure and probably will never compete with spectroscopic methods in
qualitative  analysis.  The  complex  dependence  of  an  electrochemical 
response  on  many  variables,  and  theoretical  problems  in  relating 
structure  to  electrochemical  activity,  make  qualitative  voltammetric 
analysis  even  more  formidable. 

Even  though  the  difficulties  in  qualitative  electroanalysis  are 
great,  the  rewards  of  developing  a  reliable  means  of  structural 
identification  through  electroanalysis  would  also  be  great.  Due  to 
recently  developed  miniaturization  techniques,  electrodes  are  the  most 
promising  probes  of  in  vivo  chemical  species.  Carbon  fiber  electrodes 
may  be  implanted  within  a  single  cell  or  neuron  (1).  Electrochemical 
detectors  in  liquid  chromatography  are  becoming  very  important  because  of 
their  high  sensitivity  and  selectivity.  Quantities  of  electroactive 
material in the picogram range have been analyzed. Osteryoung et al.
(2) have demonstrated the feasibility of scanning the potential of a
liquid chromatographic electrochemical detector, so the development of
qualitative voltammetric methods would allow the characterization of
eluants that are 1000 times less concentrated than
those which can be analyzed by spectroscopic techniques.

Linear-free-energy relationships have generally been the most useful
expressions  for  relating  structure  to  electrochemical  activity  in  the 
past.  A  substituent  group  will  have  a  characteristic  effect  on  the  free 
energy  of  an  electrochemical  reaction  occurring  in  its  vicinity.  This 
effect may occur through electron withdrawal, electron donation, or it
may be steric in nature. In any case, the effect may be quantified
through the use of Hammett substituent constants. For a given class of
electrochemical reactions, there will be a linear relationship between
E1/2 and the substituent constants σ (3).
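
A minimal sketch of such a linear-free-energy fit follows, assuming the relationship takes the form E1/2 = rho * sigma + E0 within one reaction series. The numerical values are hypothetical and were constructed to lie exactly on a line; they are not measured data, and the names rho and e0 are illustrative.

```python
def fit_line(sigma, e_half):
    """Ordinary least-squares fit of e_half = rho * sigma + e0."""
    n = len(sigma)
    mean_s = sum(sigma) / n
    mean_e = sum(e_half) / n
    sxx = sum((s - mean_s) ** 2 for s in sigma)
    sxy = sum((s - mean_s) * (e - mean_e) for s, e in zip(sigma, e_half))
    rho = sxy / sxx                 # reaction constant (slope), volts
    e0 = mean_e - rho * mean_s      # half-wave potential at sigma = 0, volts
    return rho, e0


# Hypothetical substituent constants and half-wave potentials (volts),
# generated from E1/2 = 0.3 * sigma - 0.50 for illustration only.
sigma = [-0.2, 0.0, 0.2, 0.7]
e_half = [-0.56, -0.50, -0.44, -0.29]

rho, e0 = fit_line(sigma, e_half)
# Predicted half-wave potential for a substituent with sigma = 0.4
predicted = rho * 0.4 + e0
```

Such a fit is only meaningful once the reaction series of the compound is known, which is the first of the difficulties discussed next.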

There are two main problems in the use of linear-free-energy
relationships. The first and largest problem is the determination of the
reaction series to which an unknown belongs. Such a deduction from
electrochemical behavior is not straightforward. Furthermore, there may
be several reaction series which may be constructed for a class of
compounds depending on solution conditions. The slope of the E1/2 vs
σ plot would be different at high pH due to a change in the mechanism
of reduction.

The second main problem is that there is often not enough E1/2
separation for different substituents or substituent combinations to
allow for confidence in identification, especially when experimental
reproducibility is low due to uncontrolled matrix effects. The
consideration of more information than E1/2 would clearly be helpful.

Because  pattern  recognition  is  well  suited  to  the  consideration  of 
large  amounts  of  information  and  to  making  use  of  obscure  relations,  we 
have  applied  it  to  chemical  structure  identification  from  electrochemical 
data. The main questions have been what data should be collected and how
much? 

Burgard and Perone (4) used staircase voltammetry to analyze 29
compounds belonging to four different electroactive group/skeleton
combinations. The classes examined were aromatic-nitro, aliphatic-nitro,
aromatic-aldehyde and aromatic-aliphatic-ketone. Fortuitously, these
classes were almost completely separated on the basis of peak potential;
but this feature alone cannot be considered sufficient for many
identification problems. Thus, the voltammograms were examined for any
shape information which might characterize a particular electroactive
group  or  the  skeleton  to  which  it  was  attached.  It  was  found  that  the 
change  in  peak  shape  with  scan  rate  produced  fair  classifications  (70% 
correct),  but  that  complete  separation  of  the  classes  was  not  possible 
for  the  experimental  conditions  and  compounds  which  were  chosen.  The 
results  suggested  that  the  information  content  of  the  electrochemical 
data  base  should  be  increased  for  more  reliable  structural 
classifications. 

The  work  described  below  by  Byers,  Freiser,  and  Perone  (5,6) 
represents  an  attempt  to  define  quantitatively  the  information  content  of 
electroanalytical  voltammetric  data  with  regard  to  structural  and 
activity  classifications.  The  general  principles  defined  in  the 
introductory sections of this paper were followed.

RESULTS  AND  DISCUSSION 

Ichise, Yamagishi and Kojima (7-9) have proposed the simultaneous
determination of complete E-i-c and Cdl-E-c patterns (c = surface
concentration)  and  have  published  several  papers  on  instrumentation  and 
data  compression  algorithms  for  reaching  that  goal.  E-i-c  patterns  were 
generated  by  applying  a  pseudo-random  waveform  to  the  cell  and  monitoring 
the  current  response.  The  surface  concentration  of  the  depolarizer  was 
calculated from the current in an analog fashion with an "s^(-1/2) module"
which eliminated the effect of diffusion. Cdl was obtained by applying
a  high  frequency  10  mV  sinusoidal  wave  to  the  cell  and  measuring  the 
amplitude of the 90 degrees out-of-phase component of the current.
The idea of obtaining double-layer capacity information may be
fruitful.  The  capacitance  of  the  double  layer  is  dependent  on  adsorption 
of  the  analyte,  and  the  strength  and  potential  dependence  of  adsorption 
may  indicate  the  presence  of  certain  functional  groups  (10). 
π-electron interaction between adsorbed molecules and the electrode
surface has a characteristic influence on the adsorption behavior of
organic substances (10), and specific interactions between the analyte
and some other molecule or ion within the double layer may also be
helpful in identification (11,12). Some adsorbed organics will inhibit
the reduction of metal ions, while others, through the so-called
"cap-pair" effect, will accelerate reductions (13).

The use of a potential-step technique such as cyclic staircase
voltammetry  represents  a  simple  alternative  to  Ichise's  method  (8)  of 
obtaining  information  on  both  adsorption  and  electron  transfer  kinetics. 
The  current  decay  immediately  after  a  step  is  primarily  capacitive  while 
current  at  later  times  is  almost  totally  due  to  electron  transfer 
reactions.  Thus,  by  measuring  the  current  at  several  times  during  each 
step  and  by  changing  the  scan  rate,  information  on  both  the  kinetics  of 
the  electrode  process  and  the  differential  capacity  can  be  obtained  with 
a  single  sweep. 

As  is  true  with  cyclic  linear  sweep  voltammetry,  the  reversal  of  the 
scan  is  important  in  detecting  chemical  reactions  which  succeed  the 
electron  transfer  step.  Immediate  repetition  of  a  cyclic  scan  may  detect 
products  which  have  been  generated  in  the  reverse  scan  of  the  first  cycle. 

One  additional  parameter  which  can  be  explored  is  the  "drop  hang 
time".  This  refers  to  the  time  period  between  the  creation  of  a 
stationary mercury drop and the beginning of the first staircase scan.
During the waiting time, a potential can be applied. This variable was
investigated in our work to see if there was any class-specific
information  in  the  kinetics  of  adsorption. 


Another  source  of  structural  information  is  the  electrochemical 
response  of  the  analyte  to  chemical  perturbations.  Changes  in  solution 
conditions  have  been  useful  in  classical  studies  of  structure-activity 
relationships.  Exploration  of  a  variety  of  solutions  will  help  define 
the  best  conditions  for  particular  classification  problems. 

All  of  the  experimental  and  solution  variables  which  have  been 
examined  systematically  in  our  classification  studies  are  listed  in 
Table  1.  The  determination  of  the  effect  of  each  of  the  seven  variables 
is  difficult  without  good  experimental  design.  To  characterize  all  main 
effects  and  all  interactions  one  could  arrange  the  experiments  by  a 
factorial design (14). For the seven variables considered here, 128 runs
would  be  needed  for  each  compound.  The  large  number  of  runs  can  be 
avoided  by  using  a  saturated  fractional  factorial  design  (15)  in  which 
the  main  effect  of  all  seven  variables  can  be  investigated  in  only  eight 
experiments.  By  running  a  second  fraction,  in  which  all  variable  levels 
have  been  reversed  from  their  state  in  the  first  fraction,  all 
confounding  between  the  main  effect  of  variables  and  the  interaction  of 
two  variables  will  be  eliminated.  Higher  order  interactions  (the 
interaction of three or more variables) may still be confounded with the
main  effects,  but  in  most  cases  such  interactions  are  relatively  small  in 
magnitude. 
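
The design strategy described above can be sketched as follows. The generators used here (the fourth through seventh variables set equal to products of the first three) are one common way to build a saturated eight-run design for seven two-level variables; they are an assumption for illustration, since the specific generators used in the study are not given in the text.

```python
from itertools import combinations, product


def saturated_fraction():
    """Eight-run, two-level design for seven variables (levels coded -1/+1).

    Variables 1-3 form a full factorial; variables 4-7 are set to the
    products ab, ac, bc and abc (assumed generators).
    """
    runs = []
    for a, b, c in product((-1, 1), repeat=3):
        runs.append((a, b, c, a * b, a * c, b * c, a * b * c))
    return runs


def fold_over(runs):
    """Second fraction: every variable level reversed."""
    return [tuple(-level for level in run) for run in runs]


def confounded_pairs(runs):
    """List (variable, (j, k)) where a main effect is not orthogonal
    to the two-variable interaction j*k."""
    n_vars = len(runs[0])
    found = []
    for i in range(n_vars):
        for j, k in combinations(range(n_vars), 2):
            if i in (j, k):
                continue
            if sum(r[i] * r[j] * r[k] for r in runs) != 0:
                found.append((i, (j, k)))
    return found


full_factorial_runs = 2 ** 7            # all combinations of seven variables
first = saturated_fraction()            # 8 runs
both = first + fold_over(first)         # 16 runs

print(full_factorial_runs, len(first), len(both))
print("confounded in first fraction:", len(confounded_pairs(first)))
print("confounded after fold-over:", len(confounded_pairs(both)))
```

In the first fraction alone every main effect is aliased with several two-variable interactions; after the reversed fraction is added, the main-effect columns are orthogonal to all two-variable interaction columns.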

In our work (5,6), a fractional factorial design was used as
described above. In addition, one of the experiments run early in the
analysis of each compound was repeated near the end of the analysis to
determine instrumental precision and to detect any decomposition of the
sample. This makes a total of 17 voltammograms which must be taken for
each compound. These experiments yield 17 current-voltage and 17
differential capacity curves for each compound.

TABLE 1. Variable levels for factorial design to study structural effects on
voltammetric data.

VARIABLE NUMBER    VARIABLE    LOW LEVEL (-)    HIGH LEVEL (+)

Graphical  analysis  of  the  error  involved  in  the  calculation  of 
variable effects was done for several nitroaromatics and nitrodiphenyl
ethers  (5).  It  was  discovered  that  all  of  the  variables  chosen  for  study 
had  significant  effects  on  the  Faradaic  responses  of  the  compounds 
examined.  The  magnitudes  of  the  effects  and  the  shapes  of  the  effect 
curves  were  quite  different,  indicating  that  redundant  information  was 
not  recorded.  All  of  the  variables  also  had  a  significant  effect  on  the 
differential  capacity  curves  of  strongly  adsorbed  species,  but  some  of 
the  effects  could  not  be  distinguished  from  noise  for  more  weakly 
adsorbed compounds. Only pH, number of cycles and % ethanol had a
significant effect on the capacitance response of both weakly and
strongly adsorbed organics.

Since  the  variables  chosen  and  the  levels  over  which  they  were 
changed  seemed  to  be  appropriate  for  most  compounds  from  a 
signal-to-noise  perspective,  the  variable  effects  were  further  examined 
for  any  information  which  might  be  useful  in  structural  classifications. 
Forty-five  compounds  representing  three  major  structural  classes  were 
chosen,  and  features  derived  from  the  variable  effects  were  tested  for 
predictive  ability  (6).  Class  1  consisted  of  19  nitroaromatics 
containing  a  single  benzene  ring;  Class  2  contained  nine 
nitrodiphenylethers, and Class 3 consisted of 17 azo compounds. The
classes  were  completely  overlapped  in  potential,  and  all  compounds  were 
reduced  by  the  same  number  of  electrons,  so  the  identification  of  the 
classes from their voltammetric behavior was not a trivial problem.
In terms of the concepts defined in the introductory sections, the
informational goal of this study was "structural classification". The
figure  of  merit  for  achievement  of  this  goal  was  "classification 
accuracy"  for  examination  of  a  data  base  containing  a  large  number  of 
items  of  known  class.  The  experimental  parameters  were  varied 
systematically  according  to  a  fractional  factorial  design.  Ultimately, 
it was desired to establish what combination(s) of experimental
parameters produced electroanalytical data with the highest information
content, using the figure of merit defined above.

The pattern recognition analysis revealed that all of the variables
produced structure-specific information. Most of the information was
found in the Faradaic responses. Changes in the Faradaic responses with
the number of cycles gave the highest classification accuracy of 93.3%.
Scan rate changes yielded 89%, while pH, surfactant and drop hang time
all produced classification accuracies of 84%. Changes in Faradaic
response with % ethanol and sampling time appeared to contain the least
structural information, giving classification accuracies of 66.7 and
75.6%, respectively. As was expected from the signal-to-noise analysis,
the effects of the several variables on the capacitive responses were
much poorer structural predictors. Classification accuracies ranged
between 60.0 and 75.6%.
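
The figure of merit used here is simply the percentage of compounds assigned to the correct class. As a small worked check, the sketch below converts the reported percentages back to counts, assuming that all 45 compounds of the data base were classified in each test; the function names are illustrative.

```python
N_COMPOUNDS = 45  # size of the data base described in the text


def accuracy_percent(n_correct, n_total=N_COMPOUNDS):
    """Classification accuracy as a percentage of correctly assigned compounds."""
    return 100.0 * n_correct / n_total


def correct_count(percent, n_total=N_COMPOUNDS):
    """Number of correct assignments implied by a reported percentage."""
    return round(percent / 100.0 * n_total)


# Accuracies quoted in the text for the Faradaic and capacitive variable effects.
reported = [93.3, 89.0, 84.0, 75.6, 66.7, 60.0]
implied = {p: correct_count(p) for p in reported}
for percent, count in implied.items():
    print(f"{percent:5.1f}%  ~ {count} of {N_COMPOUNDS} "
          f"({accuracy_percent(count):.1f}%)")
```

Under this assumption the difference between the best and the poorest variable corresponds to 15 of the 45 compounds, which gives a sense of the resolution of the figure of merit for a data base of this size.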

Although  changes  in  differential  capacity  responses  caused  by  changes 
in  the  experimental  variables  were  not  very  helpful,  the  shapes  of 
differential capacity curves which were obtained under the same
experimental conditions were excellent structural descriptors. Using
shape features derived from differential capacity curves taken under one
set of experimental conditions, 93.3% classification accuracy was
achieved. Four other sets of experimental conditions yielded over 90%
classification accuracy.

An  interesting  sidelight  of  the  organic  structural  classification 
study  was  that  herbicidal  activity  could  also  be  predicted  (6).  The 
nitrodiphenylethers  could  be  divided  into  compounds  which  were  strong 
herbicides  and  those  compounds  which  showed  little  or  no  herbicidal 
activity. Both Faradaic and capacitive responses could be used to
separate these classes for over half the experimental conditions
examined.  As  was  found  in  the  classification  of  structure,  capacitive 
factorial  features  performed  somewhat  better  than  Faradaic  factorial 
features.  It  also  appeared  that  classifications  of  herbicidal  activity 
using  Faradaic  factorial  features  could  be  improved  considerably  by 
working  at  high  pH  and  without  surfactant  present.  The  information 
content  of  Faradaic  or  capacitive  variable  effects  data  could  be  improved 
by  variations  in  %  ethanol. 

The  ability  of  voltammetric  responses  to  predict  the  herbicidal 
activity  can  be  explained  by  the  mechanism  of  herbicidal  action  for  the 
nitrodiphenylethers.  It  is  thought  that  these  compounds  are  involved  in 
the  initiation  of  destructive  free  radical  reactions  with  the 
phospholipid  molecules  which  make  up  cellular  membranes  (16).  Since  the 
first  step  in  the  reduction  of  aromatics  at  the  mercury  electrode  also 
involves  the  formation  of  radical  species  (17),  some  correlation  between 
herbicidal  activity  and  voltammetric  behavior  is  not  surprising. 


CONCLUSIONS 

The experimental study described here illustrates how the application
of the principles of information enhancement can significantly improve
chemical analysis. In this case we have established the optimum
conditions  for  obtaining  structural  or  activity  information  from 
voltammetric  electroanalytical  data.  Moreover,  it  is  clear  that  the 
informational goal(s) will dictate the most favorable choice of
experimental  conditions.  It  is  also  interesting  to  observe  that  the  most 
useful  experimental  conditions  —  such  as  the  enhancement  of  surface 
interactions  —  are  not  necessarily  those  which  are  traditionally  valued 
most  highly  in  voltammetric  studies.  This  result  points  up  another 
valuable  benefit  of  an  objective  systematic  information  enhancement 
study.  Finally,  it  should  be  observed  that  the  principles  and  general 
methodology  described  in  this  work  are  generic  and  should  be  applicable 
to  any  chemical  instrumental  systems. 

This work was supported by the Office of Naval Research and the U.S.
Department of Energy under Contract W-7405-ENG-48 to Lawrence Livermore
National Laboratory.